Cluster daily environmental drivers to assess how they converge to drive spring runoff patterns from east-side Mt Mansfield watersheds
# First we're going to filter the data to focus on March-May dynamics
# We want to avoid incorporating too much of the climate pattern, which would wash out this spring signal
# We'll want to play with the size of this window
# For reference, water-year days-of-year: Mar 15 = 166; Apr 1 = 183; May 1 = 213; May 15 = 227; May 31 = 243
mets_sub <- mets %>%
# filter(doy_wyear >= 160 & doy_wyear <= 250)
filter(doy_wyear >= 170 & doy_wyear <= 245)
We eventually want to compare cumulative runoff yields at the end of the water year, so let’s check whether cumulative runoff yields at the end of the analysis window correlate with those at the end of the water year
## [1] r = 0.88
## [1] p = 0
## [1] r = 0.86
## [1] p = 0
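A minimal sketch of that check, one r/p pair per site, assuming the column names used above (`q_mm_cum`, `doy_wyear`) and that water-year day 245 closes the analysis window:

```r
# Sketch: correlate cumulative runoff at the end of the analysis window
# (water-year day 245) with cumulative runoff at the end of the water year.
# Assumes `mets` carries site, wyear, doy_wyear, and q_mm_cum as above.
library(dplyr)
yield_chk <- mets %>%
  group_by(site, wyear) %>%
  summarise(q_window = q_mm_cum[which.min(abs(doy_wyear - 245))],
            q_year   = max(q_mm_cum, na.rm = TRUE),
            .groups  = "drop")
yield_chk %>%
  group_by(site) %>%
  summarise(r = cor(q_window, q_year),
            p = cor.test(q_window, q_year)$p.value)
```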
If two variables are strongly correlated (positively or negatively), they can effectively “double-weight” a factor that drives the clustering; so we keep just one variable of each correlated pair to serve as a proxy for that factor
# For the analysis we only need one watershed/site of data, because the drivers we use are the same for both sites
mets_sub_1site <- mets_sub %>%
filter(site == "WB")
# Step 1: Choose the variables to keep by listing those to drop
# First, list the ID and response variables that will NOT be used in the analysis
drop.info <- names(mets_sub_1site %>%
select(# Remove the non-numerical ID/INFO
site, wyear, date, doy_wyear,
# Remove response variables
q_mm, q_mm_cum, q_mm_per,
# Remove other variables we know we won't want for SOM, but will want for graphing results later
precip_mm_cum, precip_mm_per))
# List the other variables that will not be used in this analysis
drop.vars <- names(mets_sub_1site %>%
select(temp_max, temp_min,
# We'll keep the average temps measured at VWC/FEMC station, so drop those from Morrisville/Stowe airport
temp_avg,
# Keep only the cumulative total volume of water diverted from the stream by the resort
use_snow_mm_daily, use_total_mm_daily,
# After an initial look at the analysis, drop the use categories altogether
use_total_mm_cum))
# Look at correlations between variables
mets_sub_1site %>%
# Remove the vars listed above
select(-one_of(drop.info, drop.vars)) %>%
# Keep only complete observations/rows (no NAs in any of the columns)
na.omit() %>%
# Visualize these
lares::corr_cross(max_pvalue = 0.05, top = 20)
None of the remaining correlations are strong enough to be a concern
We’re only using complete observations/rows (no NAs in any columns)
According to the heuristic rule from Vesanto 2000, the number of grid elements/nodes = 5 * sqrt(n)
To determine the shape of the grid (ratio of columns to rows), we use the ratio of the first two eigenvalues of the input data set, as recommended by Park et al. 2006
## [1] No. of complete observations: 1444 out of 1444 observations
## [1] No. of Vesanto nodes: 190
## [1] Ratio of columns to rows: 1.3
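The two grid-sizing heuristics can be sketched as follows, assuming `som_input` is the complete-case numeric matrix of drivers fed to the SOM (the name is illustrative):

```r
# Sketch of the grid-sizing heuristics; `som_input` is an assumed name for the
# complete-case numeric driver matrix
n_obs    <- nrow(som_input)
n_nodes  <- round(5 * sqrt(n_obs))       # Vesanto (2000): ~5 * sqrt(n) nodes
eig      <- eigen(cov(scale(som_input)))$values
cr_ratio <- eig[1] / eig[2]              # Park et al. (2006): cols-to-rows ratio
# e.g., n_obs = 1444 gives n_nodes = 190; a ratio near 1.3 then points to
# grids such as 12 x 16 or 13 x 15, as in the runs tabulated below
```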
Code courtesy of Kristen Underwood (hidden)
## [1] Topology: hexagonal
## [1] Data normalization method used: L2norm
## [1] Weighting method used: noPCA
## [1] No. of iterations: 1500
## [1] alphaCrs used: 0.05
## [1] alphaFin used: 1500
We want to maximize npF (the ratio of between-cluster to within-cluster variance) and minimize QE (the mean distance between each data vector and its best-matching unit)
Here are the top 50% of runs based on npF
## [1] These were the top runs for each high-performing cluster #:
| Run | rows | cols | Nodes | Clusters | npF | QE |
|---|---|---|---|---|---|---|
| 9 | 12 | 16 | 192 | 3 | 1767.881 | 0.01 |
| 17 | 13 | 15 | 195 | 4 | 1716.028 | 0.01 |
| 22 | 13 | 15 | 195 | 5 | 1592.269 | 0.01 |
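The two selection metrics can be sketched as below. All names (`dat`, `bmu_vec`, `cl`) are assumptions standing in for the SOM input matrix, each row's best-matching-unit codebook vector, and the cluster labels; the hidden code's exact npF formula may differ, but a between/within variance ratio of the Calinski-Harabasz form is:

```r
# Sketch of the two run-selection metrics (all object names are assumptions)
# QE: mean Euclidean distance from each data vector to its best-matching unit
QE <- mean(sqrt(rowSums((dat - bmu_vec)^2)))

# npF-style statistic: between-cluster vs within-cluster variance
pseudo_F <- function(dat, cl) {
  n <- nrow(dat); k <- length(unique(cl))
  gm <- colMeans(dat)
  groups <- split(seq_len(n), cl)
  ssb <- sum(sapply(groups, function(i)
    length(i) * sum((colMeans(dat[i, , drop = FALSE]) - gm)^2)))
  ssw <- sum(sapply(groups, function(i) {
    cm <- colMeans(dat[i, , drop = FALSE])
    sum(sweep(dat[i, , drop = FALSE], 2, cm)^2)
  }))
  (ssb / (k - 1)) / (ssw / (n - k))      # larger = better-separated clusters
}
```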
To examine this run in greater detail (e.g., component planes), see the ‘X_SOMplots_site_ … .pdf’
## [1] R2 = 0.12
## [1] p = 0.078
## [1] R2 = -0.04
## [1] p = 0.57
## [1] R2 = -0.05
## [1] p = 0.741
## [1] R2 = -0.03
## [1] p = 0.505
Here I use a running mode to identify when the switch occurs from predominantly cluster 1 to cluster 2 or 4 (the cold-to-warm transition)
The transition is plotted as a vertical black line
Note: the running mode (width = 3) worked well for every year except 2017; for that year I shifted the transition to the first occurrence of cluster 2
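A sketch of that detector, assuming `clus` is one water year's daily cluster sequence ordered by day of water year (the function name and window handling are illustrative, not the hidden code):

```r
# Sketch of the running-mode transition detector; `clus` is an assumed name
# for one water year's daily cluster-assignment vector
running_mode <- function(x, width = 3) {
  sapply(seq_along(x), function(i) {
    win <- x[max(1, i - width + 1):i]          # trailing window of `width` days
    as.integer(names(which.max(table(win))))   # most frequent cluster in window
  })
}
# Transition day: first day the running mode becomes cluster 2 or 4
# trans_day <- which(running_mode(clus) %in% c(2, 4))[1]
# 2017 override: first raw occurrence of cluster 2
# trans_day_2017 <- which(clus == 2)[1]
```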
## [1] R2 = 0.14
## [1] p = 0.063
## [1] R2 = 0.07
## [1] p = 0.147
## [1] R2 = 0.15
## [1] p = 0.055
## [1] R2 = 0.02
## [1] p = 0.244
## [1] R2 = 0.08
## [1] p = 0.121
## [1] R2 = -0.01
## [1] p = 0.383
## [1] R2 = 0.21
## [1] p = 0.029
## [1] R2 = 0.37
## [1] p = 0.004
## [1] R2 = 0.08
## [1] p = 0.132